Abstract

The problem of compressing a real-valued sparse source using compressive sensing 
techniques is studied. The rate distortion optimality of a coding scheme in which 
compressively sensed signals are quantized and then reconstructed is established when 
the reconstruction is also required to be sparse. The result holds in general when the 
distortion constraint is on the expected p-norm of error between the source and the 
reconstruction. A new restricted isometry like property is introduced for this purpose 
and the existence of matrices that satisfy this property is shown. 
1. Introduction

In recent years, there has been an explosion in the number of applications to which compressed
(or compressive) sensing has been applied [1], [2]. Image and video capture, microarrays
and other applications represent just a sub-spectrum of its possible uses. An important application
that seems particularly promising in terms of being translated into practical circuit designs is
quantization. If the original signal to be compressed is $n$-dimensional but is $k$-sparse (i.e., has at
most $k$ non-zero entries), and $k \ll n$, then there is a significant benefit in using a compressive
sensing framework for quantization. Indeed, compressive sensing in itself represents a nearly
lossless linear transformation on the original source, and thus, compressive sensing is a "good
lossless" compression mechanism for sparse vectors, reducing the length of the representation of
the original source from an $n$-dimensional vector to an $m = O(k \log \frac{n}{k})$-dimensional vector, which
is an order-wise optimal lossless compression of the source. In particular, "sampling" matrices
$\Phi$ of dimension $m \times n$ have been shown to exist where

$$y^m = \Phi x^n, \qquad (1)$$

such that the original source $x^n$ can be recovered losslessly with high probability. This is not
surprising from a compression perspective, as optimal linear compressors are known to exist for
lossless compression.
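
As a rough illustration of the measurement step in (1), the following sketch draws a $k$-sparse vector and maps it to $m$ linear measurements. The Bernoulli $\pm 1/\sqrt{m}$ matrix mirrors the construction used later in the proof of Lemma 1; the function names, the Gaussian non-zero entries, and the leading constant 2 in the choice of $m$ are assumptions made only for this example.

```python
import math
import random

def bernoulli_matrix(m, n, rng):
    """Return an m x n matrix with i.i.d. entries +1/sqrt(m) or -1/sqrt(m)."""
    scale = 1.0 / math.sqrt(m)
    return [[scale if rng.random() < 0.5 else -scale for _ in range(n)]
            for _ in range(m)]

def sparse_vector(n, k, rng):
    """Return a length-n vector with k non-zero Gaussian entries on a random support."""
    x = [0.0] * n
    for i in rng.sample(range(n), k):
        x[i] = rng.gauss(0.0, 1.0)
    return x

def measure(phi, x):
    """Compute the measurement y = Phi x of equation (1)."""
    return [sum(a * b for a, b in zip(row, x)) for row in phi]

rng = random.Random(0)
n, k = 200, 5
# m grows like k*log(n/k); the leading constant 2 is an arbitrary choice for this sketch.
m = math.ceil(2 * k * math.log(n / k))
phi = bernoulli_matrix(m, n, rng)
x = sparse_vector(n, k, rng)
y = measure(phi, x)
print(f"source length n = {n}, measurement length m = {len(y)}")
```

The sketch only shows the dimension reduction from $n$ to $m$; it does not implement a recovery algorithm.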

Our goal in this paper is to investigate if compressive sensing is good for lossy compression
of continuous-valued sources. The answer to this question is not immediately obvious, as typical
lossy compression algorithms involve non-linear transformations between the source and its
compressed equivalent in the encoding step. There are two ways in which compressive sensing
can be combined with quantization. The first is where the number of samples $m$ in (1) is reduced
from $\Omega(k \log \frac{n}{k})$ to an orderwise smaller quantity. The resulting lossy-compressed vector $y$ is then
"inverted" to obtain a compressed version of $x$. This approach is studied in detail for the case
when $k = O(n)$ in [3]. The second is to maintain the lossless compression process as prescribed
by (1) and then use a (possibly non-linear) quantizer on $y^m$ to obtain $\hat{y}^m$. Subsequently, $\hat{y}^m$
is transformed using a suitable algorithm to obtain $\hat{x}^n$, a lossy-compressed version of $x^n$. This
second approach has been studied recently in [4], and it forms the main quantization scheme
studied in this paper.

Figure 1. Compressive sensing followed by quantization.

The chain comprising sampling using compressive sensing, then quantization and finally,
reconstruction of the compressed vector is depicted in Fig. 1. Associated with this framework
are two notions of rate, the sampling rate and the compression rate. While the sampling rate is
the rate at which the quantizer needs to sample the incoming signal, the compression rate is
the number of bits per symbol needed to represent the sampled signal within a fidelity criterion.
This chain is particularly useful in developing A/D converters for sparse sources: it reduces the
sampling rate at which they must operate, thus making their design simpler and the quantization
operation more effective. What we desire to know in this paper is if this quantization mechanism,
besides being practically efficient, is indeed optimal. In other words, if the source were to be
directly quantized (using the best quantizer available), would it suffer a lower distortion than
being first filtered in accordance with (1) and then compressed? Observe that the compression rate
at which the quantizer in Fig. 1 operates is higher than that of the optimal quantizer so that the number of
codeword indices (or the cardinality of the reconstruction alphabet) is kept equal. Mathematically,
the compression rate of the quantizer in Fig. 1 is $\log(2^{nR})/m$, while the optimal quantizer operates
at a compression rate of $R$. The result of this paper may also be interpreted as trading off
compression rate for sampling rate while still achieving the same distortion performance as the
optimal quantizer. While there is prior literature studying the performance of specific designs
of Fig. 1 [5], [6], [7], we prove a conclusive result on when the framework is optimal.

In this paper, our focus is on those quantization applications where we desire the support of
$x^n$ and $\hat{x}^n$ to be identical. This is especially important in applications where we desire that the
quantization process not introduce "spurious" signals. In sensing systems and other applications 
where the signal represents a change in state of the system, it is particularly important that 
the compression process retain the original (sparse) support of the original. A distortion in the 
sparsity pattern could lead to false activation resulting in undesirable consequences. The problem 
also has applications in DNA microarrays for cancer diagnosis, where a wrong sparsity pattern 
could lead to faulty diagnosis. 

Our main results are as follows: 

1) The coding architecture in Fig. 1 is distortion-rate optimal when the reconstruction is also
required to be sparse.

2) We show this result when the distortion constraint is on any p-norm of the error between
the source and reconstruction sequence, where p > 1.

3) In order to prove such a general result, we study a modified restricted isometry property 
(RIP) for matrices and show the existence of matrices that satisfy this property. 

The modified RIP introduced in this paper is essential in order to prove the distortion rate
optimality for p-norm distortion measures. The proof of existence of matrices satisfying the
modified RIP uses Hoeffding's inequality. Related to this work is the use of Hoeffding's inequality
to obtain RIP bounds in a recent paper [8]. Also related are results on heavy tailed restricted
isometries in [9] and RIP using tail bounds in [10]. While Hoeffding's inequality has been
previously used in other contexts to obtain RIP bounds, this paper uses it to prove the modified
RIP required for p-norm compression.

The rest of this paper is organized as follows. In the next section, we describe the system
model. In Section 3, we state the main results of the paper. We conclude the paper in Section
4. The proofs of the results are detailed in the appendix.



2. System Model 



Consider the set of all $k$-sparse vectors of length $n$ where each non-zero entry takes
any value in $\mathbb{R}$. The goal is to compress the sparse and real-valued $X^n$ to a vector $\hat{X}^n$ within
a distortion $D$. Note that in general, the rate distortion optimal quantizer does not ensure the
reconstruction, $\hat{X}^n$, is sparse. Since we desire the support of $X^n$ and $\hat{X}^n$ be identical, we let $\hat{X}^n$
belong to a $k$-sparse reconstruction space denoted by $\hat{\mathcal{X}}_k^n$. Let $T \subseteq \{1, 2, \ldots, n\}$ be the indices
such that $X_i \neq 0$ for $i \in T$. Observe that $T$ is a random set on account of $X^n$ being a random
vector. Let $X^n(T)$ be the vector with components corresponding to indices in $T$. $X^n(T^c)$ is
defined in a similar fashion.

We begin by defining the distortion rate function of the optimal quantizer. Let $D_1(R)$ be the
average distortion achieved by a code operating at rate $R$ for $k$-sparse source vectors distributed
according to $X^n \sim p_{X^n}$. Mathematically,

$$D_1(R) = \inf_{\hat{\mathcal{X}}_k^n} \frac{1}{n} \mathbb{E}\left[ \|X^n - \hat{X}^n\|_p \right]$$

subject to $|\hat{\mathcal{X}}_k^n| \leq 2^{nR}$ and $\hat{X}^n(T^c) = X^n(T^c)$.

We wish to point out that the equality constraint in the optimization problem limits the
reconstruction spaces to those that are $k$-sparse with the same sparsity pattern as the source.

We now define the optimization problem concerning the quantizer in Fig. 1 followed by the
distortion rate function of the compression scheme in Fig. 1. Let $Y^m = \Phi X^n$ and let $\hat{Y}^m = \Phi \hat{X}^n$
denote the quantized version of $Y^m$. Let $\Phi_i$ be the $i$-th row of a matrix $\Phi$ of dimension $m \times n$,
$i = 1, 2, \ldots, m$, and let $q > 1$ satisfy $\frac{1}{p} + \frac{1}{q} = 1$. Define,

$$\Delta_2(R) = \inf_{\hat{\mathcal{X}}_k^n} \frac{1}{n} \mathbb{E}\left[ \frac{1}{m} \sum_{i=1}^{m} \frac{|Y_i - \Phi_i \hat{X}^n|}{\|\Phi_i\|_q} \right]$$

subject to $|\hat{\mathcal{X}}_k^n| \leq 2^{nR}$ and $\hat{X}^n(T^c) = X^n(T^c)$.

The quantity defined above represents the distortion achieved in $Y^m$ corresponding to a particular
distortion metric. Since we force the quantizer to search over quantized versions of the form
$\hat{Y}^m = \Phi \hat{X}^n$, the optimization is again carried out over the reconstruction spaces of the form $\hat{\mathcal{X}}_k^n$.

Note that we depart from the usual convention of denoting the compression rate as the argument
of the distortion rate function in the definition of $\Delta_2(R)$. The argument $R$ in $\Delta_2(R)$ denotes the
fact that $2^{nR}$ indices are used for quantization and the compression rate is in fact $\log(2^{nR})/m$.

Let $D_2(R)$ be the average distortion achieved in $X^n$ by the scheme consisting of compressive
sensing followed by quantization and reconstruction. The chain shown in Fig. 1 uses $2^{nR}$
codewords at a compression rate $R$. In the next section, we present our main result relating
$D_2(R)$ and $D_1(R)$.

3. Main Results and Analysis 

We first briefly discuss the order-wise optimality of compressed sensing for lossless compression
before turning to the main results of this paper. Let us assume for this discussion alone
that the non-zero entries of $X^n$ are discrete random variables belonging to some alphabet $\mathcal{X}$
with finite cardinality. Let $T$ denote the sparsity pattern of $X^n$, uniformly distributed among $\binom{n}{k}$
possibilities. Mathematically, $T = \{i : X_i \neq 0\}$. If $R$ is the rate of compression, we have,

$$\begin{aligned}
nR &\geq H(\hat{X}^n) = I(X^n; \hat{X}^n) = H(X^n) - H(X^n \mid \hat{X}^n) \\
&\overset{(a)}{\geq} H(T, X_i, i \in T) - n\epsilon_n \\
&= H(T) + H(X_i, i \in T \mid T) - n\epsilon_n \\
&= k \log \frac{n}{k} + k \log e + k \log |\mathcal{X}| - n\epsilon_n.
\end{aligned}$$

Here (a) follows from Fano's inequality where $\epsilon_n \to 0$ as $n \to \infty$. Therefore compressed
sensing is an order-wise optimal lossless compression scheme.

We now turn to the main results of the paper. We state the restricted isometry property (RIP)
for matrices [11]. A matrix $\Phi$ is said to satisfy the $(\epsilon, k)$-RIP if $\forall x^n \in \mathcal{X}_k^n$,

$$(1 - \epsilon)\|x^n\|_2 \leq \|\Phi x^n\|_2 \leq (1 + \epsilon)\|x^n\|_2. \qquad (2)$$

We now state a modified version of the RIP which is useful in proving the rate distortion result
of the paper when the distortion constraint is on any p-norm on the error between the source
and the reconstruction. A matrix $\Phi$ is said to satisfy the modified $(\epsilon, k)$-RIP if $\forall x^n \in \mathcal{X}_k^n$, and
$p > 1$,

$$(1 - \epsilon)\|x^n\|_p \leq \frac{1}{m} \sum_{i=1}^{m} \frac{|\Phi_i x^n|}{\|\Phi_i\|_q} \leq (1 + \epsilon)\|x^n\|_p. \qquad (3)$$

We show the existence of matrices satisfying the above modified RIP through the following
theorem.

Theorem 1. Let $\Phi$ be a matrix of dimension $m \times n$ containing entries chosen i.i.d. and supported
in $[C_1, C_2]$, where $-\infty < C_1 < C_2 < \infty$. For every $\epsilon \in (0, 1)$, if $m = O(k \log \frac{n}{k})$, there exists a
constant $c_2 > 0$ such that with probability greater than $1 - 2e^{-m c_2}$,

$$\sup_{x^n \in \mathcal{X}_k^n} \left| \frac{1}{m} \sum_{i=1}^{m} \frac{|\Phi_i x^n|}{\|x^n\|_p \|\Phi_i\|_q} - 1 \right| \leq \epsilon.$$
The above theorem is proved in the appendix. We now show that there exist matrices that
satisfy both the RIP and the modified RIP simultaneously with high probability through the
following lemma. Such a result will be useful in proving the main result of the paper.

Lemma 1. If $m = \Omega(k \log \frac{n}{k})$, then there exists an $m \times n$ matrix $\Phi$ that satisfies the RIP (2)
and the modified RIP (3) with high probability.

Proof: Let the entries of $\Phi$ be i.i.d. and distributed according to a Bernoulli distribution
taking values $1/\sqrt{m}$ or $-1/\sqrt{m}$ with equal probability. Then, it follows from [11] that $\Phi$ satisfies
the RIP (2). Further, using Theorem 1, it follows that $\Phi$ also satisfies the modified RIP (3) with
high probability since the entries of $\Phi$ are i.i.d. and belong to $[-1, 1]$. Therefore, $\Phi$ satisfies both
(2) and (3) with high probability. □
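
The classical RIP condition for the Bernoulli construction in the proof above can be probed numerically. The sketch below is a sanity check under stated assumptions, not a verification: it evaluates the ratio $\|\Phi x^n\|_2 / \|x^n\|_2$ on randomly drawn $k$-sparse vectors, whereas the RIP quantifies over all $k$-sparse vectors. The dimensions, seed, Gaussian non-zero entries, and helper names are illustrative choices.

```python
import math
import random

def bernoulli_matrix(m, n, rng):
    """Return an m x n matrix with i.i.d. entries +1/sqrt(m) or -1/sqrt(m)."""
    scale = 1.0 / math.sqrt(m)
    return [[scale if rng.random() < 0.5 else -scale for _ in range(n)]
            for _ in range(m)]

def two_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(t * t for t in v))

def rip_ratio(phi, x):
    """Return ||Phi x||_2 / ||x||_2, the ratio that the RIP keeps close to 1."""
    y = [sum(a * b for a, b in zip(row, x)) for row in phi]
    return two_norm(y) / two_norm(x)

def random_sparse(n, k, rng):
    """Return a length-n vector with k non-zero Gaussian entries on a random support."""
    x = [0.0] * n
    for i in rng.sample(range(n), k):
        x[i] = rng.gauss(0.0, 1.0)
    return x

rng = random.Random(1)
n, k, m = 200, 5, 100
phi = bernoulli_matrix(m, n, rng)
ratios = [rip_ratio(phi, random_sparse(n, k, rng)) for _ in range(200)]
print(f"min ratio = {min(ratios):.3f}, max ratio = {max(ratios):.3f}")
```

For a 1-sparse vector the ratio equals 1 up to rounding, because every column of this matrix has unit Euclidean norm; for larger supports the spread of the ratios shrinks as $m$ grows.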
The following theorem, which is the main result of the paper, states that the coding architecture
of Fig. 1 achieves the same distortion rate function as the optimal compression scheme for the
$k$-sparse source.

Theorem 2. The coding architecture in Fig. 1 is distortion rate optimal when $\Phi$ satisfies the
RIP (2) and the modified RIP (3). Mathematically, $\forall \epsilon \in (0, 1)$, with high probability,

$$D_2(R) \leq \frac{1 + \epsilon}{1 - \epsilon} D_1(R).$$

Proof: By Lemma 1, there exists a $\Phi$ that satisfies the RIP (2) and the modified RIP (3)
simultaneously. A candidate code for the quantization of $Y^m$ can be described as follows. The
optimal codebook for $X^n$ is multiplied by $\Phi$ to obtain a codebook for $Y^m$. Now, given a $Y^m$,
the quantizer looks for that $\Phi \hat{X}^n$ that minimizes the average distortion. Since $\Phi$ satisfies the
modified RIP with high probability, we have

$$(1 - \epsilon) D_2(R) \leq \Delta_2(R). \qquad (4)$$

Now, $\forall \delta > 0$, let $\hat{X}^n$ be the quantized value of $X^n$ according to the optimal quantizer that
achieves $D_1(R) + \delta$. $\Phi \hat{X}^n$ is a feasible solution to the problem $\Delta_2(R)$. Therefore, again by the
modified RIP, with high probability,

$$\Delta_2(R) \leq \frac{1}{n} \mathbb{E}\left[ \frac{1}{m} \sum_{i=1}^{m} \frac{|Y_i - \Phi_i \hat{X}^n|}{\|\Phi_i\|_q} \right] \leq \frac{1}{n} (1 + \epsilon)\, \mathbb{E}\left[ \|X^n - \hat{X}^n\|_p \right] \leq (1 + \epsilon)(D_1(R) + \delta). \qquad (5)$$

Now, since $\Delta_2(R) \leq (1 + \epsilon)(D_1(R) + \delta)$ for all $\delta > 0$, we get

$$\Delta_2(R) \leq (1 + \epsilon) D_1(R).$$

From Equations (4) and (5), we get

$$D_2(R) \leq \frac{1 + \epsilon}{1 - \epsilon} D_1(R).$$

We claim that the scheme in Fig. 1 achieves the optimal distortion rate function since $D_2(R) \leq \frac{1 + \epsilon}{1 - \epsilon} D_1(R)$
for all $\epsilon \in (0, 1)$. However, note that as $\epsilon \to 0$, we require more and more
measurements to satisfy the RIP and modified RIP with high probability. Also, forcing $\hat{Y}^m = \Phi \hat{X}^n$
where $\hat{X}^n$ is sparse implies that recovery of the sparse vector is possible without any loss.
In other words, $\hat{X}^n$ may be exactly recovered from $\hat{Y}^m$ since $\Phi$ satisfies the RIP as well. □
For the specific case of 2-norm distortion measures, the above theorem can be proved for
matrices $\Phi$ that just satisfy the RIP alone. Define

$$D_1^2(R) = \inf_{\hat{\mathcal{X}}_k^n} \frac{1}{n} \mathbb{E}\left[ \|X^n - \hat{X}^n\|_2 \right]$$

subject to $|\hat{\mathcal{X}}_k^n| \leq 2^{nR}$ and $\hat{X}^n(T^c) = X^n(T^c)$,

and

$$\Delta_2^2(R) = \inf_{\hat{\mathcal{X}}_k^n} \frac{1}{n} \mathbb{E}\left[ \|Y^m - \Phi \hat{X}^n\|_2 \right]$$

subject to $|\hat{\mathcal{X}}_k^n| \leq 2^{nR}$ and $\hat{X}^n(T^c) = X^n(T^c)$.

Let $D_2^2(R)$ be the distortion rate function achieved by the coding architecture of Fig. 1.

Remark 1. The coding architecture of Fig. 1 is distortion rate optimal when $\Phi$ satisfies the RIP
(2). Mathematically, $\forall \epsilon \in (0, 1)$, with high probability,

$$D_2^2(R) \leq \frac{1 + \epsilon}{1 - \epsilon} D_1^2(R).$$

The proof is similar to that of Theorem 2, where only the RIP is used instead of the modified
RIP in steps (4) and (5).



4. Conclusion 



We consider the problem of quantization of sparse signals using compressive sensing. We 
show that the chain comprising compressive sampling followed by quantization and then
reconstruction is rate distortion optimal when the reconstruction is also required to be sparse. 
The result is shown when the distortion metric is any p-norm, p > 1, on the error between the 
source and the reconstruction. The proof of the result requires the compressive sensing matrix to 
satisfy a new modified restricted isometry property and we also prove the existence of matrices 
satisfying this property. 



Appendix 

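Theorem 1 is proved through the sequence of the following lemmas. The overall procedure
closely follows the technique in [10] with suitable changes as required. We first state a lemma
about the concentration of measure around the mean for bounded random variables. Let
$Z = |Y_1| + |Y_2| + \cdots + |Y_m|$, where $Y_i$, $i = 1, 2, \ldots, m$, are independent bounded continuous random
variables such that $|Y_i| \leq C$ for each $i$. Also, let $\mathbb{E}[Z] = \beta m$.

Lemma 2. For $\epsilon \in (0, 1)$, the random variable $Z$ satisfies

$$P\left[ (1 - \epsilon)\beta m \leq Z \leq (1 + \epsilon)\beta m \right] \geq 1 - 2e^{-m\gamma(\epsilon)},$$

with $\gamma(\epsilon) > 0$.

Proof: Following the procedure in [10], we use the inequality $e^{-a} \leq 1 - a + \frac{a^2}{2}$ for all
$a \geq 0$ to get

$$\mathbb{E}\left[ e^{-\lambda |Y_i|} \right] \leq 1 - \lambda \mathbb{E}[|Y_i|] + \lambda^2 \mathbb{E}[|Y_i|^2]$$

for $\lambda > 0$. Further, since $1 - a \leq e^{-a}$ for all $a \in \mathbb{R}$, we have

$$\mathbb{E}\left[ e^{-\lambda |Y_i|} \right] \leq e^{-\left( \lambda \mathbb{E}[|Y_i|] - \lambda^2 \mathbb{E}[|Y_i|^2] \right)}.$$

Therefore, by Markov's inequality, we obtain,

$$\begin{aligned}
P\left[ -Z \geq -(1 - \epsilon)\beta m \right] &= P\left[ e^{-\lambda Z} \geq e^{-\lambda(1 - \epsilon)\beta m} \right] \\
&\leq e^{\lambda(1 - \epsilon)\beta m}\, \mathbb{E}\left[ e^{-\lambda Z} \right] \\
&= e^{\lambda(1 - \epsilon)\beta m} \prod_{i=1}^{m} \mathbb{E}\left[ e^{-\lambda |Y_i|} \right] \\
&\leq e^{-\lambda \left[ \sum_{i=1}^{m} \mathbb{E}[|Y_i|] - m(1 - \epsilon)\beta - \lambda \sum_{i=1}^{m} \mathbb{E}[|Y_i|^2] \right]} \\
&\leq e^{-m\lambda \left[ \epsilon\beta - \frac{\lambda}{m} \sum_{i=1}^{m} \mathbb{E}[|Y_i|^2] \right]}.
\end{aligned}$$

For the other side of the inequality, we use Hoeffding's inequality as follows. Now,

$$\begin{aligned}
P\left[ Z \geq (1 + \epsilon)\beta m \right] &= P\left[ e^{\lambda Z} \geq e^{\lambda(1 + \epsilon)\beta m} \right] \\
&\leq e^{-\lambda(1 + \epsilon)\beta m}\, \mathbb{E}\left[ e^{\lambda Z} \right] \\
&\leq e^{-\lambda(1 + \epsilon)\beta m} \left[ e^{\lambda^2 C^2 / 8} \right]^m \\
&= e^{-m\lambda \left[ (1 + \epsilon)\beta - \lambda C^2 / 8 \right]},
\end{aligned}$$

where we use Markov's inequality in the second step and Hoeffding's inequality in the third
step. Choosing $\lambda < \min\left\{ \frac{\epsilon\beta m}{\sum_{i=1}^{m} \mathbb{E}[|Y_i|^2]}, \frac{8(1 + \epsilon)\beta}{C^2} \right\}$, we get

$$P\left[ (1 - \epsilon)\beta m \leq Z \leq (1 + \epsilon)\beta m \right] \geq 1 - 2e^{-m\gamma(\epsilon)},$$

with $\gamma(\epsilon) = \lambda \min\left\{ \epsilon\beta - \frac{\lambda}{m} \sum_{i=1}^{m} \mathbb{E}[|Y_i|^2],\ (1 + \epsilon)\beta - \frac{\lambda C^2}{8} \right\} > 0$. □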
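
The concentration described by Lemma 2 can be observed in a small simulation. The sketch below assumes one particular bounded distribution, $Y_i$ uniform on $[-1, 1]$ (so $C = 1$ and $\beta = \mathbb{E}|Y_i| = 0.5$), and estimates the probability that $Z$ falls in the band $[(1 - \epsilon)\beta m, (1 + \epsilon)\beta m]$ for a few values of $m$. The function name `coverage`, the distribution, and the trial count are assumptions of this example; the simulation does not compute $\gamma(\epsilon)$.

```python
import random

def coverage(m, eps, trials, rng):
    """Estimate P[(1 - eps)*beta*m <= Z <= (1 + eps)*beta*m] by simulation.

    Z = |Y_1| + ... + |Y_m| with Y_i uniform on [-1, 1], so beta = E|Y_i| = 0.5.
    """
    beta = 0.5
    lo, hi = (1 - eps) * beta * m, (1 + eps) * beta * m
    hits = 0
    for _ in range(trials):
        z = sum(abs(rng.uniform(-1.0, 1.0)) for _ in range(m))
        if lo <= z <= hi:
            hits += 1
    return hits / trials

rng = random.Random(2)
eps = 0.2
results = {m: coverage(m, eps, 2000, rng) for m in (25, 100, 400)}
for m, frac in results.items():
    print(f"m = {m:4d}: empirical probability = {frac:.3f}")
```

The empirical probability should approach 1 as $m$ grows, in line with the exponential form of the bound.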
The following lemma states that for every given $x^n$, there exists a matrix that satisfies the
condition in the modified isometry property with high probability. Note that this does not prove
the statement of the theorem yet since we need to show the existence of a matrix that satisfies
the condition for all $x^n \in \mathcal{X}_k^n$.

Lemma 3. For every given $x^n$, and $\epsilon \in (0, 1)$, an $m \times n$ matrix $\Phi$ with i.i.d. entries supported
in $[C_1, C_2]$, where $-\infty < C_1 < C_2 < \infty$, satisfies

$$P\left[ \left| \frac{1}{m} \sum_{i=1}^{m} \frac{|\Phi_i x^n|}{\|x^n\|_p \|\Phi_i\|_q} - 1 \right| > \epsilon \right] \leq 2e^{-m\gamma(\epsilon)},$$

where $\gamma(\epsilon) > 0$ and $\frac{1}{p} + \frac{1}{q} = 1$, $p, q > 1$.



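Proof: Since each entry of $\Phi$ is i.i.d. and bounded in $[C_1, C_2]$, by Holder's inequality, we
have

$$|\Phi_i x^n| \leq \|x^n\|_p \|\Phi_i\|_q,$$

where $q$ satisfies $\frac{1}{p} + \frac{1}{q} = 1$. Therefore, the random variable $Y_i = \frac{|\Phi_i x^n|}{\|x^n\|_p \|\Phi_i\|_q}$ satisfies $|Y_i| \leq 1$,
for $i = 1, 2, \ldots, m$. Let $\sum_{i=1}^{m} \mathbb{E}[|Y_i|] = m\beta$. It follows that $\beta \leq 1$, since

$$\mathbb{E}\left[ \frac{|\Phi_i x^n|}{\|x^n\|_p \|\Phi_i\|_q} \right] \leq \mathbb{E}\left[ \frac{\|x^n\|_p \|\Phi_i\|_q}{\|x^n\|_p \|\Phi_i\|_q} \right] = 1.$$

By applying Lemma 2, we get

$$\begin{aligned}
P\left[ (1 - \epsilon)\beta \leq \sum_{i=1}^{m} \frac{1}{m} |Y_i| \leq (1 + \epsilon)\beta \right] &\geq 1 - 2e^{-m\gamma(\epsilon)}, \\
P\left[ 1 - \epsilon \leq \sum_{i=1}^{m} \frac{1}{m} |Y_i| \leq 1 + \epsilon \right] &\geq 1 - 2e^{-m\gamma(\epsilon)},
\end{aligned}$$

since $\beta \leq 1$. The desired result follows from the above. □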
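We now present a lemma about the quantization of vectors in the unit p-norm ball. We
characterize the size of the set required to represent every such vector within a prescribed
p-norm error.

Lemma 4. For $\epsilon \in (0, 1)$, there exists a finite set $Q \subset \mathbb{R}^n$ such that $|Q| \leq \sqrt{\frac{n}{2\pi p}} \left( \frac{c_1}{\epsilon} \right)^n$ and

$$\sup_{x^n : \|x^n\|_p \leq 1}\ \min_{v^n \in Q} \|x^n - v^n\|_p \leq \epsilon.$$

Proof: For $k \in \mathbb{N}$, a natural number, define

$$Q' = \left\{ x^n : x_i = \frac{j}{k} \text{ for some } j \in \{-k, -k + 1, \ldots, k\} \right\}.$$

$Q'$ is a set of quantization indices in $n$ dimensions with size $(2k + 1)^n$. We now define
$Q = Q' \cap B_p(1)$, where $B_p(1)$ is the unit ball in $\mathbb{R}^n$ with $L_p$ norm as the distance metric. The
size of $Q$ is then the ratio of the volumes of the unit ball $B_p(1)$ to the unit cube times the size
of $Q'$. The volume of $B_p(1)$ is given by

$$\mathrm{Vol}(B_p(1)) = \frac{\left( 2\,\Gamma\left( \frac{p + 1}{p} \right) \right)^n}{\Gamma\left( \frac{p + n}{p} \right)}.$$

Therefore,

$$|Q| \leq (2k + 1)^n\, \frac{\Gamma\left( \frac{p + 1}{p} \right)^n}{\Gamma\left( \frac{p + n}{p} \right)}.$$

We now specify the choice of $k$ that bounds $|Q|$ as required. Let $v^n$ be the quantization
index for $x^n$. Mathematically, for $i = 1, 2, \ldots, n$, we choose $v_i = \mathrm{sign}(x_i) \frac{\lfloor k |x_i| \rfloor}{k}$. Therefore,
$|x_i - v_i| \leq \frac{1}{k}$ and

$$\|x^n - v^n\|_p \leq \frac{n^{1/p}}{k}.$$

Choosing $k = \lceil n^{1/p} \rceil / \epsilon$, we satisfy $\|x^n - v^n\|_p \leq \epsilon$, and obtain

$$|Q| \leq \left( \frac{3 \lceil n^{1/p} \rceil}{\epsilon} \right)^n \frac{\Gamma\left( \frac{p + 1}{p} \right)^n}{\Gamma\left( \frac{p + n}{p} \right)}.$$

Now,

$$\Gamma(1 + n/p) \geq \sqrt{\frac{2\pi p}{n}} \left( \frac{n}{ep} \right)^{n/p}$$

and $\Gamma\left( \frac{p + 1}{p} \right) \leq 1$. Therefore

$$|Q| \leq \sqrt{\frac{n}{2\pi p}}\, \frac{6^n n^{n/p}}{\epsilon^n} \left( \frac{ep}{n} \right)^{n/p} \leq \sqrt{\frac{n}{2\pi p}} \left( \frac{c_1}{\epsilon} \right)^n,$$

where $c_1 = 6(ep)^{1/p}$. □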
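
The entrywise quantizer used in the proof of Lemma 4 can be sketched in a few lines. The example assumes a small hand-picked vector inside the unit 2-norm ball and rounds the grid resolution $k$ up to an integer, which is an assumption of this sketch; the helper names `p_norm` and `quantize` are illustrative.

```python
import math

def p_norm(v, p):
    """Return the p-norm of a list of floats."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)

def quantize(x, k):
    """Map each entry to sign(x_i) * floor(k * |x_i|) / k, i.e. round toward zero on a grid of step 1/k."""
    return [math.copysign(math.floor(k * abs(t)) / k, t) for t in x]

p, eps = 2.0, 0.25
x = [0.5, -0.3, 0.0, 0.62]          # hand-picked vector with ||x||_2 <= 1
n = len(x)
# Grid resolution k = ceil(n^(1/p)) / eps, rounded up to an integer for this sketch.
k = math.ceil(math.ceil(n ** (1.0 / p)) / eps)
v = quantize(x, k)
err = p_norm([a - b for a, b in zip(x, v)], p)
print(f"k = {k}, v = {v}, error = {err:.4f}, bound n^(1/p)/k = {n ** (1.0 / p) / k:.4f}")
```

Rounding toward zero never increases the magnitude of an entry, so the quantized vector stays inside the unit ball whenever the input does.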
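We now prove Theorem 1. $\Phi$ is a matrix of dimension $m \times n$ containing entries chosen
i.i.d. and supported in $[C_1, C_2]$, where $-\infty < C_1 < C_2 < \infty$. We need to show that for every
$\epsilon \in (0, 1)$, if $m = O(k \log \frac{n}{k})$, there exists a constant $c_2 > 0$ such that with probability greater
than $1 - 2e^{-m c_2}$, we have

$$\sup_{x^n \in \mathcal{X}_k^n} \left| \sum_{i=1}^{m} \frac{1}{m} \frac{|\Phi_i x^n|}{\|x^n\|_p \|\Phi_i\|_q} - 1 \right| \leq \epsilon.$$

Without loss of generality, we consider $x^n$ such that $\|x^n\|_p = 1$. We first prove that for the
set of $k$-sparse vectors $x^n$ with a given sparsity pattern, $\Phi$ exists with probability greater than
$1 - 2(4c_1/\epsilon)^k e^{-m\gamma}$.

By Lemma 4, there exists a set $Q$ with size $|Q| \leq \sqrt{\frac{k}{2\pi p}} (8c_1/\epsilon)^k$, such that

$$\sup_{x^n}\ \min_{v^n \in Q} \|x^n - v^n\|_p \leq \epsilon/8.$$

Now, we show that there exists a matrix $\Phi$ such that

$$\sup_{v^n \in Q} \left| \sum_{i=1}^{m} \frac{1}{m} \frac{|\Phi_i v^n|}{\|v^n\|_p \|\Phi_i\|_q} - 1 \right| \leq \epsilon/2 \qquad (6)$$

with high probability. By Lemma 3 and a union bounding argument, we have,

$$P\left[ \sup_{v^n \in Q} \left| \sum_{i=1}^{m} \frac{1}{m} \frac{|\Phi_i v^n|}{\|v^n\|_p \|\Phi_i\|_q} - 1 \right| > \epsilon/2 \right] \leq |Q|\, 2e^{-m\gamma(\epsilon/2)} \leq 2\sqrt{\frac{k}{2\pi p}}\, (8c_1/\epsilon)^k e^{-m\gamma(\epsilon/2)}.$$

We now prove the statement of the theorem. By Holder's inequality,

$$\sum_{i=1}^{m} \frac{1}{m} \frac{|\Phi_i x^n|}{\|x^n\|_p \|\Phi_i\|_q} \leq 1.$$

Thus, $\sum_{i=1}^{m} \frac{1}{m} \frac{|\Phi_i x^n|}{\|x^n\|_p \|\Phi_i\|_q} \leq 1 + \epsilon$. Now, for the other side, using (6) and $\|x^n\|_p = 1$,

$$\sum_{i=1}^{m} \frac{1}{m} \frac{|\Phi_i x^n|}{\|x^n\|_p \|\Phi_i\|_q} \geq \sum_{i=1}^{m} \frac{1}{m} \frac{|\Phi_i v^n|}{\|\Phi_i\|_q} - \sum_{i=1}^{m} \frac{1}{m} \frac{|\Phi_i (x^n - v^n)|}{\|\Phi_i\|_q} \geq (1 - \epsilon/2)(1 - \epsilon/8) - \epsilon/8 \geq 1 - \epsilon.$$

Here the first term is bounded using (6) together with $\|v^n\|_p \geq 1 - \epsilon/8$, and the second using
Holder's inequality together with $\|x^n - v^n\|_p \leq \epsilon/8$.

Now, considering all $\binom{n}{k} \leq (en/k)^k$ $k$-sparse vectors, the probability that there does not exist
$\Phi$ satisfying the modified RIP is upper bounded by

$$2(en/k)^k \sqrt{\frac{k}{2\pi p}}\, (8c_1/\epsilon)^k e^{-m\gamma(\epsilon/2)} \leq e^{k[\log(en/k) + \log(8c_1/\epsilon)] + \frac{1}{2}\log\frac{k}{2\pi p} - m\gamma(\epsilon/2) + \log 2}.$$

Therefore, if

$$m > \frac{1}{\gamma(\epsilon/2)} \left( k[\log(en/k) + \log(8c_1/\epsilon)] + \frac{1}{2}\log\frac{k}{2\pi p} + \log 2 \right),$$

there exists $c_2 > 0$, chosen smaller than $\gamma(\epsilon/2) - \frac{1}{m}\left( k[\log(en/k) + \log(8c_1/\epsilon)] + \frac{1}{2}\log\frac{k}{2\pi p} + \log 2 \right)$,
such that the probability that there does not exist a matrix satisfying the p-norm condition is upper
bounded by $2e^{-m c_2}$.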
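
To see how the sufficient number of measurements in the last step scales, the following sketch evaluates the right-hand side of the condition on $m$. The values of `gamma` (standing in for $\gamma(\epsilon/2)$) and `c1` (standing in for the constant $c_1$) are hypothetical placeholders chosen only for illustration; the proof does not give numerical values for them, and the function name is an assumption of this example.

```python
import math

def measurement_bound(n, k, p, eps, gamma, c1):
    """Evaluate (1/gamma) * (k[log(en/k) + log(8 c1/eps)] + 0.5*log(k/(2 pi p)) + log 2)."""
    log_terms = (k * (math.log(math.e * n / k) + math.log(8.0 * c1 / eps))
                 + 0.5 * math.log(k / (2.0 * math.pi * p))
                 + math.log(2.0))
    return log_terms / gamma

gamma, c1 = 0.1, 10.0   # hypothetical placeholder constants, not taken from the paper
b1 = measurement_bound(1000, 10, 2.0, 0.5, gamma, c1)
b2 = measurement_bound(2000, 10, 2.0, 0.5, gamma, c1)
print(f"n = 1000: m > {b1:.1f}")
print(f"n = 2000: m > {b2:.1f}")
print(f"increase from doubling n: {b2 - b1:.2f}")
```

Doubling $n$ at fixed $k$ adds $k \log 2 / \gamma(\epsilon/2)$ to the bound, which reflects the $k \log \frac{n}{k}$ growth of the required number of measurements.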